Saturday, January 23, 2021

"Codd was wrong" and "You're teaching the gospel" Betray Lack of Foundation Knowledge



Note: I have documented and debunked these misconceptions so many times that I will no longer reference them -- the reader motivated to gain genuine understanding should use the (1) blog labels (2) Blogger search (3) POSTS page to locate the relevant posts.

I have long claimed that a core problem in the industry is that the vast majority of practitioners who use relational terminology do not know or understand what it means, yet are convinced they do -- the less the understanding, the greater the conviction. A recent LinkedIn exchange provided -- as if it were needed -- yet another example. It was triggered by my comment:

“How many know today that a relation is by definition in 5NF, otherwise it's not a relation, the relational algebra has "anomalies" and all bets are off? IMO, none! If you need to "do" normalization, you did not design correctly, which means you don't understand the RDM.”
that prompted the following reaction:
“Is that really true? You construct a table and fill it full of garbage. It may not even be in 1NF, but is it not still a "relation" of columns, even if it's not a relation of rows or attributes? Codd had no real conception of syntax as separate from semantics, I don't think relational theory has a clear position on this. This is where Kimball and dimensional systems differ from Codd's relational, it made some effort (not a lot) to distinguish syntactic and semantic elements.”
--Joshua Stern


 Sigh!

  • A relation (either mathematical or database -- not the same thing!) is not a table (if it were a table with columns, what are attributes?);
  • A table "full of garbage" is not an R-table -- it does not visualize even a mathematical, let alone a database relation (why?);
  • At the time Codd introduced the RDM the three levels of representation had not yet emerged and he focused almost entirely on where the RDM belongs -- the logical level. But the RDM is applied theory -- simple set theory expressible in first order logic (SST/FOPL) adapted to database management -- an adaptation that would have been neither possible nor useful without a real world interpretation (semantics). Database relations are semantically constrained such that they represent entity groups with properties and relationships within and among groups:
- Domains represent properties; 
- Attributes represent properties of specific types of entities (properties in groups' contexts); 
- Tuples represent (facts about -- properties) entities; 
- Constraints represent relationships within and among groups.
  • Kimball's "dimensional systems" are, essentially, application-biased database designs that have nothing to do with the semantics and logic of the RDM.
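
The correspondence listed above (domains, attributes, tuples, constraints) can be made concrete with a minimal Python sketch. Everything named here is an assumption for illustration: the employee group, the three domains, and the helpers `make_tuple` and `check_key` are hypothetical and are not how any DBMS or data sublanguage implements relations. The point is only that a "table full of garbage" fails these checks and therefore does not visualize a database relation.

```python
# Hypothetical domains: each names a property and decides which values are admissible.
DOMAINS = {
    "EmpId": lambda v: isinstance(v, int) and v > 0,
    "Name": lambda v: isinstance(v, str) and v != "",
    "Salary": lambda v: isinstance(v, int) and 0 <= v <= 1_000_000,
}

# Attributes: properties in the context of one entity group (here, employees).
EMPLOYEE_HEADING = {"emp_id": "EmpId", "name": "Name", "salary": "Salary"}


def make_tuple(**values):
    """Build one tuple (facts about one entity), enforcing the domain constraints."""
    if set(values) != set(EMPLOYEE_HEADING):
        raise ValueError("a tuple must have exactly the attributes of the heading")
    for attr, value in values.items():
        domain = EMPLOYEE_HEADING[attr]
        if not DOMAINS[domain](value):
            raise ValueError(f"{attr}={value!r} is not in domain {domain}")
    # A tuple is a set of (attribute, value) pairs: no left-to-right order.
    return frozenset(values.items())


def check_key(body, key=("emp_id",)):
    """Constraint within the group: no two tuples share the same key value."""
    seen = set()
    for t in body:
        d = dict(t)
        k = tuple(d[a] for a in key)
        if k in seen:
            return False
        seen.add(k)
    return True


# The body of the relation is a set of tuples: no top-to-bottom order, no duplicates.
employees = {
    make_tuple(emp_id=1, name="Ann", salary=50_000),
    make_tuple(emp_id=2, name="Bob", salary=42_000),
}
```

Calling `make_tuple(emp_id=3, name="", salary=10)` raises `ValueError`, and adding a second tuple with `emp_id=1` makes `check_key` return `False`: in both cases what you have is a table, not an R-table.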

“If a table is not a relation, then what is it? This is just avoiding the issue. If 5NF were such a basic thing, there wouldn't even be a need to worry about 1234NF. 5NF does not achieve what Codd hoped for, there's not really any rigorous way to measure semantic consistency like he assumed. That is why for the most part 3NF is sufficient, and not even pure 3NF because shortcuts like identity columns are too useful to exclude, and "fact tables" that are purposely not in 3NF are important aspects of many apps.”
--Joshua Stern
Sigh!

  • A table is a table; an R-table is a table that visualizes a relation on a physical medium (in fact, only the body of an R-table, as the header is metadata) -- the tabular arrangement of rows and columns plays no role in the RDM;
  • RDM's grounding in simple set theory means that relations are simple sets (i.e., no nested sets that would require second-order logic and lose relational advantages), which is another way of saying that they are in 1NF (simple domains with values treated as atomic by the data sublanguage);
  • Codd changed the initial version of the relational algebra (RA), including the join operator, to the current one we are familiar with (well, those who are!). The details are beyond the scope of a post; suffice it to say that the old join was to 1NF what the current join is to 5NF. In other words, given the current RA, system-guaranteeable logical validity and by-design semantic consistency require 5NF databases (which is why we now contend that relations are by definition in 5NF, otherwise they are not relations and all bets are off);
  • McGoveran's early argument that RDM's grounding in SST/FOPL required adherence to three principles of database design was never understood. Though it has not yet been proved, he believes that joint adherence produces 5NF databases. The 2-3-4NF were a historical accident (due in part to the pedagogical use of tables that induced the confusion with relations). As we explained more than once, explicit further normalization from 1NF to 5NF arises only as repair of poor designs that did not adhere to the principles.
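
That the tabular arrangement plays no role can be shown with a small Python sketch. The part/color data and the helper `relation` are assumptions for illustration only. Two tables that differ in row order and column order visualize one and the same relation body once each is read as a set of tuples.

```python
def relation(rows):
    """Body of a relation: a set of tuples, each a set of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


# Two hypothetical tables: same facts, different row and column arrangement.
table_a = [
    {"part": "P1", "color": "red"},
    {"part": "P2", "color": "blue"},
]
table_b = [
    {"color": "blue", "part": "P2"},
    {"color": "red", "part": "P1"},
]

# Comparing the sets, not the arrangements, is what the RDM is concerned with.
same_relation = relation(table_a) == relation(table_b)
different_tables = table_a != table_b
```

Both `same_relation` and `different_tables` are `True` here: the tables differ, the relation they visualize does not.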

Algebras do not have "anomalies", so database relations are in 5NF by definition (as well as in 1NF).
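
What goes wrong below 5NF can be sketched in Python with a textbook-style case. The agent/company/product data and the helpers `rel`, `project`, `natural_join` and `rejoin` are assumptions for illustration, not anything from the exchange. The design is subject to a join dependency that its key (all three attributes) does not imply: it must always equal the join of its three binary projections, so single tuples cannot be inserted or deleted independently of one another.

```python
ATTRS = ("agent", "company", "product")  # hypothetical attribute names


def rel(rows, attrs=ATTRS):
    """Build a body: a set of tuples, each a set of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(r, attrs):
    """Projection: keep only the named attributes; duplicates vanish in the set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r, s):
    """Natural join: combine tuples that agree on all common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


def rejoin(r):
    """Join of the three binary projections of r."""
    ac = project(r, {"agent", "company"})
    cp = project(r, {"company", "product"})
    pa = project(r, {"product", "agent"})
    return natural_join(natural_join(ac, cp), pa)


sells = rel([
    ("A1", "C1", "P2"),
    ("A1", "C2", "P1"),
    ("A2", "C1", "P1"),
    ("A1", "C1", "P1"),
])

# The join dependency holds: joining the projections gives back exactly sells.
holds = rejoin(sells) == sells

# Delete one tuple: the remaining three still force it back via the join.
broken = sells - rel([("A1", "C1", "P1")])
forced_back = rejoin(broken) == sells
```

Both `holds` and `forced_back` are `True`. The three-tuple body `broken` is inconsistent with the dependency, which is exactly the kind of "anomaly" that does not arise when the three binary facts are represented by three 5NF relations in the first place.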

Several attempts were made to disabuse Stern of misconceptions (which, given my five decades of experience, I knew was futile) and, of course, he met them with his following "interpretation" of Codd's work:

“You can't quote Codd at me as a counter when I'm saying Codd was wrong. You're just preaching the gospel and I'm telling you it's wrong. I know what I'm saying, but I suspect you're not listening ... The point of Codd's work was to give us something useful, which requires a physical realization. His error was to hold that there was some logic that could move around physical data without worrying about semantics, once it was in 3NF/5NF. But he ignored the *cost* which is determined by the physical implementation. So much of actual database administration is correlating physical form with semantics, in schema and in queries. I don't blame Codd for not solving all these issues, but every working DBA has to know things that Codd never worried about.”
Sigh!
As a good rule of thumb, "Codd was wrong", "did not solve the problem...", and "you're preaching the gospel" are dead giveaways. Yes, there were some Codd errors, and we've written about them (e.g., four-valued logic that inspired NULL), but that must be allowed for given his pioneering effort and huge contribution. The chance, however, that practitioners will identify them correctly is almost nil. Sure enough, the paragraph reveals a lack of understanding of levels of representation and data independence, and, thus, of the RDM.

  • Codd did not "ignore" physical implementation (how could he, particularly given IBM's pressure?). The RDM rendered logical design -- and, thus, applications -- independent of it, such that they are unimpaired by physical changes for performance optimization purposes. Stern has it upside down and backwards: physical independence reduces, not increases, the "cost" of implementation, by giving the DBA much more flexibility.
  • RDM -- the adaptation and application of SST/FOPL to database management -- would have been neither possible, nor useful without a real world interpretation: Why else would integrity -- semantic constraints -- be a component of the RDM? Upside down and backwards again: conceptual models (semantics) are formalized using the RDM as logical models for database representations precisely so that they can be computerized.
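
Physical independence can be illustrated with a deliberately simplified Python sketch. The two storage classes, the `restrict` function and the employee data are all hypothetical, and no real DBMS is structured this way. The logical request is written once against the relation, and swapping the physical arrangement underneath changes the cost but not the result.

```python
class HeapStore:
    """Hypothetical physical layout 1: rows kept in arrival order."""

    def __init__(self, rows):
        self._rows = list(rows)

    def scan(self):
        return iter(self._rows)


class HashIndexedStore:
    """Hypothetical physical layout 2: rows kept in a hash table on a key."""

    def __init__(self, rows, key):
        self._by_key = {row[key]: row for row in rows}

    def scan(self):
        return iter(self._by_key.values())


def restrict(store, predicate):
    """Logical restriction: the set of tuples satisfying the predicate.

    Nothing here refers to how the store arranges its rows.
    """
    return {frozenset(row.items()) for row in store.scan() if predicate(row)}


rows = [
    {"emp_id": 2, "name": "Bob", "salary": 42_000},
    {"emp_id": 1, "name": "Ann", "salary": 50_000},
    {"emp_id": 3, "name": "Cy", "salary": 61_000},
]


def well_paid(row):
    return row["salary"] > 45_000


from_heap = restrict(HeapStore(rows), well_paid)
from_index = restrict(HashIndexedStore(rows, key="emp_id"), well_paid)
same_answer = from_heap == from_index
```

`same_answer` is `True`: the DBA is free to change the physical layout for performance without touching the logical design or the applications that depend on it.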

“Codd made a *huge* assumption that some sort of unique ID was NOT an essential, necessary aspect of his model, and to a large degree he was wrong. The misunderstanding was Codd's. The identity key is two things, first it is a numbering or ordering of the rows, not required of Codd's relational model because he based it on set theory where sets are generally not ordered, order is something that is done externally. He thought that since each row has a unique primary key, that would be sufficient, but a PK can be changed without changing the real-world entity, it is all more complex than he allowed. Second it is something to preserve that real-world identity across such changes. It is also a handy surrogate key since the unique PK in many cases is long enough to demand one for reasons of efficiency. Codd never thought about efficiency, but just that surrogacy can deliver 100:1 improvement in many common cases.”
The misunderstanding is all Stern's -- so much so that I responded "When you're in a hole, stop digging". I don't see much value in debunking the same misconceptions over and over again. Since I have written extensively about keys, I refer the reader to my paper, as well as to the various posts on the subject.




